Bounding and Comparing Methods for Correlation Clustering Beyond ILP

نویسندگان

  • Micha Elsner
  • Warren Schudy
چکیده

We evaluate several heuristic solvers for correlation clustering, the NP-hard problem of partitioning a dataset given pairwise affinities between all points. We experiment on two practical tasks, document clustering and chat disentanglement, to which ILP does not scale. On these datasets, we show that the clustering objective often, but not always, correlates with external metrics, and that local search always improves over greedy solutions. We use semi-definite programming (SDP) to provide a tighter bound, showing that simple algorithms are already close to optimality.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A flexible ILP formulation for hierarchical clustering

Hierarchical clustering is a popular approach in a number of fields with many well known algorithms. However, all existing work to our knowledge implements a greedy heuristic algorithm with no explicit objective function. In this work we formalize hierarchical clustering as an integer linear programming (ILP) problem with a natural objective function and the dendrogram properties enforced as li...

متن کامل

Fuzzy clustering of time series data: A particle swarm optimization approach

With rapid development in information gathering technologies and access to large amounts of data, we always require methods for data analyzing and extracting useful information from large raw dataset and data mining is an important method for solving this problem. Clustering analysis as the most commonly used function of data mining, has attracted many researchers in computer science. Because o...

متن کامل

Comparing Model-based Versus K-means Clustering for the Planar Shapes

‎In some fields‎, ‎there is an interest in distinguishing different geometrical objects from each other‎. ‎A field of research that studies the objects from a statistical point of view‎, ‎provided they are‎ ‎invariant under translation‎, ‎rotation and scaling effects‎, ‎is known as the statistical shape analysis‎. ‎Having some objects that are registered using key points on the outline...

متن کامل

بررسی تأثیر نرم کننده پرانرژی مایع یونی بر پایه ایمیدازولیوم بر خواص حرارتی نیتروسلولز

In this paper investigates an energetic imidazolium ionic liquid plasticizer (ILP)  effect on the degradation kinetics of nitrocellulose, which is a important component of double based solid propellants. For better comparison and evaluation, diethyl phthalate (DEP) plasticizer, which has a structure similar to ILP, was also evaluated. Heat of combustion analysis was performed to evaluate the en...

متن کامل

A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach

In recent years, the advancement of information gathering technologies such as GPS and GSM networks have led to huge complex datasets such as time series and trajectories. As a result it is essential to use appropriate methods to analyze the produced large raw datasets. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009